Collaborator

@WaVEV commented Sep 9, 2025

Design Doc

This PR implements a unified approach for generating MQL from Django expressions. The core idea is to centralize the control flow in a base_expression method, which decides whether an expression can be translated into a direct field: value match (index-friendly) or must fall back to $expr. This keeps the wrapping and dispatch logic in one place, while each lookup or function only defines how to build its own expression.

This approach also allows direct field: value matches and $expr clauses to be mixed within the same $match. As a result, multiple $expr clauses may coexist alongside index-friendly conditions, depending on the shape of the query.
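A minimal sketch of how already-translated conditions might share one $match stage. It assumes the conditions are joined with $and (the description doesn't say which combinator the PR uses), and `build_match` is a hypothetical name. A single dict can't hold two "$expr" keys, so some combinator is needed once several $expr clauses appear.

```python
def build_match(conditions: list) -> dict:
    """Combine translated conditions into a single $match stage.

    Each condition is either an index-friendly path match such as
    {"price": {"$gte": 10}} or an {"$expr": ...} clause. A dict can't hold
    two "$expr" keys, so multiple conditions are joined with $and
    (an assumption made for this sketch).
    """
    if len(conditions) == 1:
        return {"$match": conditions[0]}
    return {"$match": {"$and": list(conditions)}}


stage = build_match([
    {"price": {"$gte": 10}},                                # path match, can use an index
    {"$expr": {"$gt": ["$price", "$cost"]}},                # compares two fields
    {"$expr": {"$eq": [{"$toLower": "$name"}, "widget"]}},  # function applied to a column
])
```

The first condition stays a plain path match, so an index on price can still be used, while the two $expr clauses sit next to it as siblings rather than being nested.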

Most lookups now follow this pattern by implementing only as_mql_expr (and, optionally, as_mql_path when a match-based translation is possible). Only a few special cases, such as Col and the Func operators (except KeyTransform), among others, override the base behavior directly. This structure also leaves room for future optimizations (e.g., constant folding) without changing the overall flow.
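The pattern can be sketched with simplified, hypothetical stand-ins. Plain column names and Python values replace compiler, connection, process_lhs, and process_rhs. The base class owns the dispatch. GreaterThanOrEqual implements both forms. StrLenGreaterThan, a made-up expression-only lookup, implements only as_mql_expr and therefore always falls back to $expr. Wrapping constants in $literal is also an assumption of this sketch.

```python
from typing import Any


class Lookup:
    """Hypothetical, simplified stand-in for a lookup node."""

    def __init__(self, column: str, value: Any, *,
                 lhs_is_simple_column: bool = True, rhs_is_constant: bool = True):
        self.column = column
        self.value = value
        self.lhs_is_simple_column = lhs_is_simple_column
        self.rhs_is_constant = rhs_is_constant

    @property
    def can_use_path(self) -> bool:
        # A direct field match needs a plain column, a constant RHS,
        # and a subclass that actually implements as_mql_path.
        overrides_path = type(self).as_mql_path is not Lookup.as_mql_path
        return self.lhs_is_simple_column and self.rhs_is_constant and overrides_path

    def as_mql_path(self) -> dict:
        raise NotImplementedError

    def as_mql_expr(self) -> dict:
        raise NotImplementedError

    def base_expression(self) -> dict:
        # The only place that chooses between the two forms and adds $expr.
        if self.can_use_path:
            return self.as_mql_path()
        return {"$expr": self.as_mql_expr()}


class GreaterThanOrEqual(Lookup):
    def as_mql_path(self) -> dict:
        return {self.column: {"$gte": self.value}}

    def as_mql_expr(self) -> dict:
        # $literal keeps a string constant such as "$x" from being read as a field path.
        return {"$gte": [f"${self.column}", {"$literal": self.value}]}


class StrLenGreaterThan(Lookup):
    # Expression-only lookup: no as_mql_path, so base_expression always uses $expr.
    def as_mql_expr(self) -> dict:
        return {"$gt": [{"$strLenCP": f"${self.column}"}, {"$literal": self.value}]}
```

For example, GreaterThanOrEqual("price", 10).base_expression() returns {"price": {"$gte": 10}}. In contrast, StrLenGreaterThan("name", 3).base_expression() returns {"$expr": {"$gt": [{"$strLenCP": "$name"}, {"$literal": 3}]}}. Subclasses never wrap themselves in $expr, which keeps that decision in one place.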

Additionally, since MongoDB 6 does not allow nesting an $expr inside another $expr, the flow in base_expression ensures such cases are flattened. In practice, expressions are generated without redundant wrapping, so the final MQL never contains an $expr within an $expr.
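A test could check the "no $expr within $expr" invariant with a small recursive walk over the generated MQL. `has_nested_expr` is a hypothetical helper, not part of the PR. The PR avoids producing nested $expr in the first place rather than detecting it afterwards. Sibling $expr clauses (for example, under a top-level $and) are allowed; only nesting is flagged.

```python
from typing import Any


def has_nested_expr(mql: Any, inside_expr: bool = False) -> bool:
    """Return True if an "$expr" key appears anywhere inside another $expr."""
    if isinstance(mql, dict):
        for key, value in mql.items():
            if key == "$expr" and inside_expr:
                return True
            # Everything under an $expr key is in an expression context.
            if has_nested_expr(value, inside_expr or key == "$expr"):
                return True
    elif isinstance(mql, list):
        return any(has_nested_expr(item, inside_expr) for item in mql)
    return False


flat = {"$and": [{"price": {"$gte": 10}}, {"$expr": {"$gt": ["$price", "$cost"]}}]}
nested = {"$expr": {"$and": [{"$expr": {"$gt": ["$a", 1]}}, {"$eq": ["$b", 2]}]}}
```

Here `flat` passes the check, while `nested` is flagged because its inner $expr sits under the outer one.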

@WaVEV force-pushed the lookup-refactor branch 3 times, most recently from 529e0ff to a78f26b, September 15, 2025 21:21
@timgraham changed the title from "WIP lookup refactor" to "INTPYTHON-751 Make query generation omit $expr unless required", Sep 20, 2025
Substr.as_mql = substr
Trim.as_mql = trim("trim")
TruncBase.as_mql = trunc
Cast.as_mql_expr = cast
Collaborator Author

None of the functions support as_mql_path. It could be added later if we try to simplify constant expressions.
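The constant-simplification idea can be illustrated with a hypothetical sketch. Value, Col, Func, to_mql, exact, and concat are made-up names, not the backend's classes. A function whose arguments are all constants is evaluated in Python, so the comparison can use a path match instead of $expr. Each Func carries a Python callable that is assumed to behave like its MQL operator.

```python
from typing import Any, Callable, Optional


class Value:
    """Hypothetical constant node."""
    def __init__(self, value: Any):
        self.value = value


class Col:
    """Hypothetical column reference."""
    def __init__(self, name: str):
        self.name = name


class Func:
    """Hypothetical function node: an MQL operator plus an equivalent Python callable."""
    def __init__(self, operator: str, python_impl: Callable[..., Any], *args):
        self.operator = operator
        self.python_impl = python_impl
        self.args = args

    def fold(self) -> Optional[Value]:
        """Evaluate in Python if every argument is (or folds to) a constant."""
        values = []
        for arg in self.args:
            if isinstance(arg, Func):
                arg = arg.fold()
            if not isinstance(arg, Value):
                return None  # depends on a column, so it can't be folded
            values.append(arg.value)
        return Value(self.python_impl(*values))


def to_mql(node) -> Any:
    """Render a node as an aggregation expression."""
    if isinstance(node, Value):
        return {"$literal": node.value}
    if isinstance(node, Col):
        return f"${node.name}"
    return {node.operator: [to_mql(arg) for arg in node.args]}


def exact(col: Col, rhs) -> dict:
    """Translate col == rhs, preferring a path match when rhs folds to a constant."""
    if isinstance(rhs, Func):
        rhs = rhs.fold() or rhs
    if isinstance(rhs, Value):
        return {col.name: {"$eq": rhs.value}}
    return {"$expr": {"$eq": [to_mql(col), to_mql(rhs)]}}


def concat(*parts: str) -> str:
    # Python counterpart of $concat for string arguments.
    return "".join(parts)
```

Folding runs before the path/$expr decision, so a fully constant right-hand side looks like any other constant to the dispatch logic.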

@WaVEV marked this pull request as ready for review September 26, 2025 04:52
@WaVEV requested review from Jibola and timgraham and removed request for Jibola September 26, 2025 04:52
@WaVEV force-pushed the lookup-refactor branch 4 times, most recently from 5827580 to 5b9fa93, October 4, 2025 02:50
Contributor

@Jibola left a comment

Left my first round of comments. I'm going to review the tests as a second phase of the PR.

Comment on lines 936 to 945
def as_mql_expr(self, compiler, connection):
    # Aggregation-expression form, used inside $expr.
    lhs_mql = process_lhs(self, compiler, connection, as_path=False)
    value = process_rhs(self, compiler, connection, as_path=False)
    return {"$gte": [lhs_mql, value]}

def as_mql_path(self, compiler, connection):
    # Query (path) form, usable directly in $match.
    lhs_mql = process_lhs(self, compiler, connection, as_path=True)
    value = process_rhs(self, compiler, connection, as_path=True)
    return {lhs_mql: {"$gte": value}}

Contributor

Why is there a $gte query in a search.text lookup?

Collaborator Author

To convert a score function into a filter, I decided to express it as the proposition score_func(...) > 0.

    return self.is_simple_column

@cached_property
def is_simple_column(self):
Contributor

I see this validation for embedded models used in multiple places. Can you consolidate it into the query_utils file?

Collaborator Author

🤔 I'll try. The overall structure of the functions is similar, but the types vary. I could create a meta-function that generates the appropriate function for a given type.

Collaborator Author

I couldn't make it better 😬, so I'll leave it as it is. Some details differ, such as using previous._field.column vs. previous.key_name to extract the field name or the path. The return value also differs between EMFA and EMF: one has to validate the inner transform while the other doesn't.

def valid_path_key_name(key_name):
    return bool(re.fullmatch(r"[A-Za-z0-9_]+", key_name))
Contributor

https://www.mongodb.com/docs/manual/core/dot-dollar-considerations/

Values like hashtags are also valid in path names and don't require $expr. To my knowledge, as long as a key doesn't contain a period (.) or a dollar sign ($), it's fine.

Collaborator Author

Thanks, will adjust the expression.

Collaborator Author

What if there is an emoji or some other non-ASCII character? 🤔
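The reviewer's suggestion can be sketched as a hypothetical relaxed check and compared with the regex from the diff. `relaxed_valid_path_key_name` is a made-up name, and rejecting any "$" (not only a leading one) is a conservative choice of this sketch. The check works on characters, not ASCII classes, so accents and emoji pass, which also addresses the question above.

```python
import re


def strict_valid_path_key_name(key_name: str) -> bool:
    """The check from the diff: ASCII letters, digits, and underscores only."""
    return bool(re.fullmatch(r"[A-Za-z0-9_]+", key_name))


def relaxed_valid_path_key_name(key_name: str) -> bool:
    """Hypothetical relaxed check following the review suggestion.

    A key can be used in a dotted path if it is non-empty and contains no
    "." (the path separator) or "$" (conservatively rejected anywhere).
    The null character is rejected because MongoDB field names can't
    contain it. Non-ASCII characters such as accents or emoji are accepted.
    """
    return bool(key_name) and not any(ch in key_name for ch in ".$\x00")
```

With this check, "#tag", "café", and "🙂" stay on the path-based route, while "a.b" and "$$Wifi" still fall back to $expr.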

Comment on lines 211 to 215
def as_mql_expr(self, compiler, connection):
    columns, parent_field = self._get_target_path()
    mql = parent_field.as_mql(compiler, connection)
    for key in columns:
        mql = {"$getField": {"input": mql, "field": key}}
Contributor

potentially out of scope:
https://github.com/mongodb/django-mongodb-backend/pull/392/files#diff-0a6ce30a131a00fa88086c4c4d0d6e6232845fd11ef2bc67891fdf92e10c3743R18-R45

Is it still possible to remove $getField in as_mql_expr, or is routing embedded model queries to as_mql_expr expected because they need a $getField call?
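For comparison, here are the two expression forms under discussion, sketched with hypothetical helper names. get_field_chain mirrors the $getField loop in as_mql_expr. dotted_field_path builds the equivalent field path (such as "$data.integer_"), assuming no key contains "." or "$". The sketch ignores array-valued fields, where a field path and $getField may behave differently.

```python
from typing import Any, List


def get_field_chain(root: Any, keys: List[str]) -> Any:
    """Nested $getField form, mirroring the loop in as_mql_expr."""
    mql = root
    for key in keys:
        mql = {"$getField": {"input": mql, "field": key}}
    return mql


def dotted_field_path(root: str, keys: List[str]) -> str:
    """Equivalent field-path form; assumes no key contains "." or "$"."""
    return ".".join([root, *keys])
```

For a single key, get_field_chain("$data", ["integer_"]) produces {"$getField": {"input": "$data", "field": "integer_"}}, and dotted_field_path("$data", ["integer_"]) produces "$data.integer_".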

Comment on lines 141 to 147
@property
def can_use_path(self):
    simple_column = getattr(self.lhs, "is_simple_column", False)
    constant_value = is_constant_value(self.rhs)
    return simple_column and constant_value
Contributor

💯

self.assertAggregateQuery(
    query,
    "model_fields__nullableintegerarraymodel",
    [{"$match": {"field": {"$in": ([1], [2])}}}],
Contributor

Does $in now expect a tuple?

Collaborator Author

Yes, but it isn't a new convention or expectation. An existing test already checks the $in RHS as a tuple, so I just followed that convention. It doesn't affect the query behavior.

"$match": {
"$expr": {
"$eq": [
{"$getField": {"input": "$data", "field": "integer_"}},
Contributor

This is an example of a query that could drop the $getField.

Collaborator Author

Yes, we could get rid of those $getField calls by using "$data.integer_", but I thought it was out of scope for this refactor. This is the current behavior.

Contributor

Yes, it's out of scope!

[
    {
        "$match": {
            "$expr": {
Contributor

I'm curious about the call chain on this one, since the null check could actually be converted.

That may be best left as an improvement for later.

Comment on lines +179 to +180
if not valid_path_key_name(previous._field.column):
    return False
Collaborator

The purpose of this could be described with a comment. What's an example of an invalid path key name? valid_path_key_name could also use a comment or docstring. I guess "user.address.city" is a path, and "user", "address", etc. are considered "keys"?

Collaborator Author

@WaVEV Oct 13, 2025

A JSON field could have unusual strings as keys. This test shows the case. And yes, I called the parts between the dots "keys". 🤔 I don't know if there is a better name.

Collaborator

But this is EmbeddedModelTransform, not JSONField's KeyTransform. Is this tested? I tried running model_fields_ with assert False above the return False, and all tests passed.

Collaborator Author

@WaVEV Oct 14, 2025

🤔 I thought I had added a test, but I forgot. We could define a column like:
wifi_column = models.IntegerField(db_column="$$Wifi")
And now I'm thinking about the other models and lookups: will they work if users start defining columns like that?

But maybe this analysis is out of scope for this ticket 😬
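A column such as db_column="$$Wifi" can't be used as a $match key, because a leading "$" is parsed as an operator. Prefixing it with "$" also doesn't give a usable field path. MongoDB's documented way to read such a field inside an expression is $getField with the name wrapped in $literal. The sketch below uses a hypothetical helper name and is not how the backend currently builds column references.

```python
from typing import Any


def column_ref(column: str) -> Any:
    """Hypothetical helper: reference a top-level column inside an aggregation expression."""
    if column and not any(ch in column for ch in ".$"):
        # Ordinary names can use the short field-path syntax.
        return f"${column}"
    # Names containing "." or "$" are read with $getField; wrapping the name
    # in $literal stops MongoDB from parsing a "$"-prefixed name as a path or variable.
    return {"$getField": {"field": {"$literal": column}, "input": "$$CURRENT"}}
```

With this helper, column_ref("price") gives "$price". column_ref("$$Wifi") gives {"$getField": {"field": {"$literal": "$$Wifi"}, "input": "$$CURRENT"}}, which only works inside $expr.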

Collaborator

Without tests, I'm not sure if there could be other issues. We should ask the team if we want to support it. It seems unlikely based on the discouragements described at https://www.mongodb.com/docs/manual/core/dot-dollar-considerations/#field-names-with-periods-and-dollar-signs.

Collaborator Author

Test added: test_query_price_column (it should be polished, but it shows the problem).
